Identifying Repetitive Portions of Clinical Notes and Generating Summaries Pertinent to Treatment of a Patient Based on the Identified Repetitive Portions

ABSTRACT

A mechanism is provided in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a repetitive portion identification and weighting engine. A machine learning model is trained for weighting repetitive portions of patient electronic medical records (EMRs). A repetitive portion identification component applies a plurality of templates to clinical notes of a patient EMR to identify one or more candidate portions that match at least one of the plurality of templates. A content analysis component performs content analysis on the one or more candidate portions to determine whether each given candidate portion is relevant. A weighting component assigns a relative weight to each given candidate portion based on relevance. A cognitive summary graphical user interface (GUI) generation component generates a cognitive summary reflecting at least a subset of the one or more candidate portions of the patient EMR. The mechanism outputs the cognitive summary in a GUI to a user.

BACKGROUND

The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for identifying repetitive portions of clinical notes and generating summaries pertinent to treatment of a patient based on the identified repetitive portions.

An electronic health record (EHR) or electronic medical record (EMR) is the systematized collection of patient and population electronically-stored health information in a digital format. These records can be shared across different health care settings. Records are shared through network-connected, enterprise-wide information systems or other information networks and exchanges. EMRs may include a range of data, including demographics, social history, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and weight, and billing information.

EMR systems are designed to store data accurately and to capture the state of a patient across time. They eliminate the need to track down a patient's previous paper medical records and help ensure that data is accurate and legible. They can reduce the risk of data replication because there is only one modifiable file, which means the file is more likely to be up to date, and they decrease the risk of lost paperwork. Because the digital information is searchable and in a single file, EMRs are more effective when extracting medical data for the examination of possible trends and long-term changes in a patient. Population-based studies of medical records may also be facilitated by the widespread adoption of EMRs.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described herein in the Detailed Description. This Summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In one illustrative embodiment, a method is provided in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a repetitive portion identification and weighting engine. The method comprises training a machine learning model for weighting repetitive portions of patient electronic medical records (EMRs). The method further comprises applying, by a repetitive portion identification component executing within the repetitive portion identification and weighting engine, a plurality of templates to clinical notes of a patient EMR to identify one or more candidate portions that match at least one of the plurality of templates. The method further comprises performing, by a content analysis component executing within the repetitive portion identification and weighting engine, content analysis on the one or more candidate portions to determine whether each given candidate portion is relevant. The method further comprises assigning, by a weighting component executing within the repetitive portion identification and weighting engine, a relative weight to each given candidate portion based on relevance. The method further comprises generating, by a cognitive summary graphical user interface (GUI) generation component executing within the repetitive portion identification and weighting engine, a cognitive summary reflecting at least a subset of the one or more candidate portions of the patient EMR. The method further comprises outputting the cognitive summary in a GUI to a user.
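The method does not specify which machine learning model is trained for weighting. As one hedged illustration only, the following sketch fits a logistic-regression model by batch gradient descent, assuming each candidate portion is described by two hypothetical features (a relevance score and the fraction of prior notes that repeat the portion) and labeled 1 when a clinician found it useful in a summary. The features, labels, and the names train_weighting_model and portion_weight are assumptions, not part of the described embodiment.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_weighting_model(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit logistic-regression coefficients and a bias with batch gradient descent.

    Hypothetical features per candidate portion:
      [relevance to current condition, fraction of prior notes repeating it]
    """
    n_samples = len(features)
    n_features = len(features[0])
    coef = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error of the current model on this sample.
            error = sigmoid(sum(w * xi for w, xi in zip(coef, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Average the gradient over the batch and take one descent step.
        coef = [w - learning_rate * g / n_samples for w, g in zip(coef, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return coef, bias


def portion_weight(coef: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Relative weight of a candidate portion as a probability-like score."""
    return sigmoid(sum(w * xi for w, xi in zip(coef, x)) + bias)


# Toy (hypothetical) training data: relevant portions are useful whether or not they repeat.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.1], [0.1, 0.9], [0.2, 0.7], [0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0]
coef, bias = train_weighting_model(X, y)
print(round(portion_weight(coef, bias, [0.9, 0.9]), 2))
```

In this toy data the repetition feature is deliberately uninformative, so the learned model weights a portion mainly by relevance, which mirrors the point that repetition by itself does not determine importance.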

In other illustrative embodiments, a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.

In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.

These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a schematic diagram of one illustrative embodiment of a cognitive healthcare system in a computer network;

FIG. 2 is a block diagram of an example data processing system in which aspects of the illustrative embodiments are implemented;

FIG. 3 is an example diagram illustrating an interaction of elements of a healthcare cognitive system in accordance with one illustrative embodiment;

FIG. 4 is a block diagram of a repetitive portion identification and weighting engine in accordance with an illustrative embodiment;

FIG. 5 illustrates operation of augmenting key aspects of clinical notes with additional features based on external resources in accordance with an illustrative embodiment; and

FIG. 6 is a flowchart illustrating operation of a mechanism for repetitive portion identification and weighting in accordance with an illustrative embodiment.

DETAILED DESCRIPTION

Many times, when a doctor documents an encounter with a patient by entering a clinical note into the patient record, the doctor may repeat portions of previous notes that may or may not be relevant to the current encounter. Moreover, for some kinds of information, repetition does not indicate lower accuracy or importance but, rather, reflects routine yet important elements of clinical notes, e.g., temperature and blood pressure readings, particular laboratory results, etc. Thus, discriminating between irrelevant repetitive information and important repetitive information is context-dependent and difficult for automated systems to accomplish.

The illustrative embodiments provide a mechanism for identifying and evaluating repetitive content in clinical notes to determine its relative importance in conveying information to a medical professional, such as a doctor, about the patient's condition and/or treatment in a summary view of the patient's electronic medical record (EMR) via a cognitive system. That is, when providing a cognitive system that summarizes the most relevant portions of a patient's EMR for use by a doctor, the mechanism of the illustrative embodiments is able to discern (1) what portions of a clinical note are repetitive of other clinical notes or portions of the EMR; (2) the nature of the information conveyed in the repetitive portion relative to the treatment of the patient as a whole and/or the particular encounter for which the clinical note was generated; and (3) based on the nature of the information conveyed, whether that repetitive portion should be weighted higher or lower than other portions of the EMR when generating a summary of the relevant portions of the EMR.
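Step (1), discerning which portions of a clinical note repeat earlier notes, can be illustrated with a simple text-similarity check. The sketch below is a hedged stand-in, not the template-based approach described for the engine: it swaps in word-shingle Jaccard similarity between sentences, and the sentence splitter, the shingle size k=3, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import re
from typing import Iterable, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of k-word shingles; texts shorter than k yield one shingle of all words."""
    # Keeps tokens such as "130/85" and "98.6" intact, drops trailing punctuation.
    words = re.findall(r"\w+(?:[./]\w+)*", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def repetitive_sentences(
    current_note: str, prior_notes: Iterable[str], threshold: float = 0.6
) -> List[Tuple[str, float]]:
    """Return sentences of current_note whose best match in prior notes meets threshold."""
    prior = [shingles(s) for note in prior_notes for s in split_sentences(note)]
    found = []
    for sentence in split_sentences(current_note):
        best = max((jaccard(shingles(sentence), p) for p in prior), default=0.0)
        if best >= threshold:
            found.append((sentence, best))
    return found


prior_notes = ["Patient reports chronic low back pain radiating to the left leg. BP 130/85."]
current = ("Patient reports chronic low back pain radiating to the left leg. "
           "Started physical therapy today.")
for sentence, score in repetitive_sentences(current, prior_notes):
    print(f"{score:.2f}  {sentence}")
```

Only the copied-forward pain description is flagged; the new therapy sentence is not. Detecting repetition this way says nothing about importance, which is why steps (2) and (3) are still needed.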

Before beginning the discussion of the various aspects of the illustrative embodiments in more detail, it should first be appreciated that throughout this description the term “mechanism” will be used to refer to elements of the present invention that perform various operations, functions, and the like. A “mechanism,” as the term is used herein, may be an implementation of the functions or aspects of the illustrative embodiments in the form of an apparatus, a procedure, or a computer program product. In the case of a procedure, the procedure is implemented by one or more devices, apparatus, computers, data processing systems, or the like. In the case of a computer program product, the logic represented by computer code or instructions embodied in or on the computer program product is executed by one or more hardware devices in order to implement the functionality or perform the operations associated with the specific “mechanism.” Thus, the mechanisms described herein may be implemented as specialized hardware, software executing on general purpose hardware, software instructions stored on a medium such that the instructions are readily executable by specialized or general purpose hardware, a procedure or method for executing the functions, or a combination of any of the above.

The present description and claims may make use of the terms “a”, “at least one of”, and “one or more of” with regard to particular features and elements of the illustrative embodiments. It should be appreciated that these terms and phrases are intended to state that there is at least one of the particular feature or element present in the particular illustrative embodiment, but that more than one can also be present. That is, these terms/phrases are not intended to limit the description or claims to a single feature/element being present or require that a plurality of such features/elements be present. To the contrary, these terms/phrases only require at least a single feature/element with the possibility of a plurality of such features/elements being within the scope of the description and claims.

Moreover, it should be appreciated that the use of the term “engine,” if used herein with regard to describing embodiments and features of the invention, is not intended to be limiting of any particular implementation for accomplishing and/or performing the actions, steps, processes, etc., attributable to and/or performed by the engine. An engine may be, but is not limited to, software, hardware and/or firmware or any combination thereof that performs the specified functions including, but not limited to, any use of a general and/or specialized processor in combination with appropriate software loaded or stored in a machine readable memory and executed by the processor. Further, any name associated with a particular engine is, unless otherwise specified, for purposes of convenience of reference and not intended to be limiting to a specific implementation. Additionally, any functionality attributed to an engine may be equally performed by multiple engines, incorporated into and/or combined with the functionality of another engine of the same or different type, or distributed across one or more engines of various configurations.

In addition, it should be appreciated that the following description uses a plurality of various examples for various elements of the illustrative embodiments to further illustrate example implementations of the illustrative embodiments and to aid in the understanding of the mechanisms of the illustrative embodiments. These examples are intended to be non-limiting and are not exhaustive of the various possibilities for implementing the mechanisms of the illustrative embodiments. It will be apparent to those of ordinary skill in the art in view of the present description that there are many other alternative implementations for these various elements that may be utilized in addition to, or in replacement of, the examples provided herein without departing from the spirit and scope of the present invention.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

As noted above, the present invention provides mechanisms for graphical presentation of relevant information from electronic medical records. The illustrative embodiments may be utilized in many different types of data processing environments. In order to provide a context for the description of the specific elements and functionality of the illustrative embodiments, FIGS. 1-3 are provided hereafter as example environments in which aspects of the illustrative embodiments may be implemented. It should be appreciated that FIGS. 1-3 are only examples and are not intended to assert or imply any limitation with regard to the environments in which aspects or embodiments of the present invention may be implemented. Many modifications to the depicted environments may be made without departing from the spirit and scope of the present invention.

FIGS. 1-3 are directed to describing an example cognitive system for healthcare applications (also referred to herein as a “healthcare cognitive system”) which implements a request processing pipeline, such as a Question Answering (QA) pipeline (also referred to as a Question/Answer pipeline or Question and Answer pipeline) for example, request processing methodology, and request processing computer program product with which the mechanisms of the illustrative embodiments are implemented. These requests may be provided as structured or unstructured request messages, natural language questions, or any other suitable format for requesting an operation to be performed by the healthcare cognitive system. As described in more detail hereafter, the particular healthcare application that is implemented in the cognitive system of the present invention is a healthcare application for presenting relevant information using a graphical presentation engine.

It should be appreciated that the healthcare cognitive system, while shown as having a single request processing pipeline in the examples hereafter, may in fact have multiple request processing pipelines. Each request processing pipeline may be separately trained and/or configured to process requests associated with different domains or be configured to perform the same or different analysis on input requests (or questions in implementations using a QA pipeline), depending on the desired implementation. For example, in some cases, a first request processing pipeline may be trained to operate on input requests directed to a first medical malady domain (e.g., various types of blood diseases) while another request processing pipeline may be trained to answer input requests in another medical malady domain (e.g., various types of cancers). In other cases, for example, the request processing pipelines may be configured to provide different types of cognitive functions or support different types of healthcare applications, such as one request processing pipeline being used for patient diagnosis, another request processing pipeline being configured for cognitive analysis of EMR data, another request processing pipeline being configured for patient monitoring, etc.

Moreover, each request processing pipeline may have its own associated corpus or corpora that it ingests and operates on, e.g., one corpus for blood disease domain documents and another corpus for cancer diagnostics domain related documents in the above examples. These corpora may include, but are not limited to, EMR data. The cognitive system may determine what portions of a clinical note are repetitive of other clinical notes or portions of the EMR, the nature of the information conveyed in the repetitive portion relative to the treatment of the patient as a whole and/or the particular encounter for which the clinical note was generated, and, based on the nature of the information conveyed, whether that repetitive portion should be weighted higher or lower than other portions of the EMR when generating a summary of the relevant portions of the EMR.

As will be discussed in greater detail hereafter, the illustrative embodiments may be integrated in, augment, and extend the functionality of these QA pipeline, or request processing pipeline, mechanisms of a healthcare cognitive system with regard to a mechanism for identifying and weighting repetitive portions of clinical notes.

Thus, it is important to first have an understanding of how cognitive systems and question and answer creation in a cognitive system implementing a QA pipeline are implemented before describing how the mechanisms of the illustrative embodiments are integrated in and augment such cognitive systems and request processing pipeline, or QA pipeline, mechanisms. It should be appreciated that the mechanisms described in FIGS. 1-3 are only examples and are not intended to state or imply any limitation with regard to the type of cognitive system mechanisms with which the illustrative embodiments are implemented. Many modifications to the example cognitive system shown in FIGS. 1-3 may be implemented in various embodiments of the present invention without departing from the spirit and scope of the present invention.

FIG. 1 depicts a schematic diagram of one illustrative embodiment of a cognitive system 100 implementing a request processing pipeline 108 in a computer network 102. The cognitive system 100 is implemented on one or more computing devices 104A-C (comprising one or more processors and one or more memories, and potentially any other computing device elements generally known in the art including buses, storage devices, communication interfaces, and the like) connected to the computer network 102. For purposes of illustration only, FIG. 1 depicts the cognitive system 100 being implemented on computing device 104A only, but as noted above the cognitive system 100 may be distributed across multiple computing devices, such as a plurality of computing devices 104A-C. The network 102 includes multiple computing devices 104A-C, which may operate as server computing devices, and 110-112 which may operate as client computing devices, in communication with each other and with other devices or components via one or more wired and/or wireless data communication links, where each communication link comprises one or more of wires, routers, switches, transmitters, receivers, or the like. In some illustrative embodiments, the cognitive system 100 and network 102 may provide cognitive operations including, but not limited to, request processing and cognitive response generation which may take many different forms depending upon the desired implementation, e.g., cognitive information retrieval, training/instruction of users, cognitive evaluation of data, or the like. Other embodiments of the cognitive system 100 may be used with components, systems, sub-systems, and/or devices other than those that are depicted herein.

The cognitive system 100 is configured to implement a request processing pipeline 108 that receives inputs from various sources. The requests may be posed in the form of a natural language question, natural language request for information, natural language request for the performance of a cognitive operation, or the like, and the answer may be returned in a natural language format maximized for efficient comprehension in a point-of-care clinical setting. For example, the cognitive system 100 receives input from the network 102, a corpus or corpora of electronic documents 106, cognitive system users, and/or other data and other possible sources of input. In one embodiment, some or all of the inputs to the cognitive system 100 are routed through the network 102. The various computing devices 104A-C on the network 102 include access points for content creators and cognitive system users. Some of the computing devices 104A-C include devices for a database storing the corpus or corpora of data 106 (which is shown as a separate entity in FIG. 1 for illustrative purposes only). Portions of the corpus or corpora of data 106 may also be provided on one or more other network attached storage devices, in one or more databases, or other computing devices not explicitly shown in FIG. 1. The network 102 includes local network connections and remote connections in various embodiments, such that the cognitive system 100 may operate in environments of any size, including local and global, e.g., the Internet.

In one embodiment, the content creator creates content in a document of the corpus or corpora of data 106 for use as part of a corpus of data with the cognitive system 100. The document includes any file, text, article, or source of data for use in the cognitive system 100. Cognitive system users access the cognitive system 100 via a network connection or an Internet connection to the network 102, and input questions/requests to the cognitive system 100 that are answered/processed based on the content in the corpus or corpora of data 106. In one embodiment, the questions/requests are formed using natural language. The cognitive system 100 parses and interprets the question/request via a pipeline 108, and provides a response to the cognitive system user, e.g., cognitive system user 110, containing one or more answers to the question posed, response to the request, results of processing the request, or the like. In some embodiments, the cognitive system 100 provides a response to users in a ranked list of candidate answers/responses while in other illustrative embodiments, the cognitive system 100 provides a single final answer/response or a combination of a final answer/response and ranked listing of other candidate answers/responses.

The cognitive system 100 implements the pipeline 108 which comprises a plurality of stages for processing an input question/request based on information obtained from the corpus or corpora of data 106. The pipeline 108 generates answers/responses for the input question or request based on the processing of the input question/request and the corpus or corpora of data 106.

In some illustrative embodiments, the cognitive system 100 may be the IBM Watson™ cognitive system available from International Business Machines Corporation of Armonk, N.Y., which is augmented with the mechanisms of the illustrative embodiments described hereafter. As outlined previously, a pipeline of the IBM Watson™ cognitive system receives an input question or request which it then parses to extract the major features of the question/request, which in turn are then used to formulate queries that are applied to the corpus or corpora of data 106. Based on the application of the queries to the corpus or corpora of data 106, a set of hypotheses, or candidate answers/responses to the input question/request, are generated by looking across the corpus or corpora of data 106 for portions of the corpus or corpora of data 106 (hereafter referred to simply as the corpus 106) that have some potential for containing a valuable response to the input question/request (hereafter assumed to be an input question). The pipeline 108 of the IBM Watson™ cognitive system then performs deep analysis on the language of the input question and the language used in each of the portions of the corpus 106 found during the application of the queries using a variety of reasoning algorithms.
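The feature-extraction and query stages can be pictured with a deliberately simplified sketch. It does not reproduce the IBM Watson™ pipeline; it assumes that the "major features" of a question are its non-stop-word tokens and that candidate passages are ranked by how many of those features they contain. STOP_WORDS, extract_features, and retrieve_candidates are hypothetical names used only for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stop-word list; a real pipeline would use full natural language parsing.
STOP_WORDS = {"what", "is", "the", "a", "an", "of", "for", "to", "in", "which", "and"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens ("First-line" becomes "first", "line")."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(question: str) -> List[str]:
    """Keep the content words of the question as its 'major features'."""
    return [w for w in tokenize(question) if w not in STOP_WORDS]


def retrieve_candidates(
    question: str, corpus: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank corpus passages by how many question features they contain."""
    terms = set(extract_features(question))
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(terms & set(tokenize(text)))
        if overlap:
            scored.append((doc_id, overlap))
    # Highest overlap first; ties broken by document id for deterministic output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


corpus = {
    "note1": "Metformin prescribed for type 2 diabetes.",
    "note2": "Lisinopril started for hypertension.",
    "guide1": "First-line therapy for type 2 diabetes is metformin.",
}
print(retrieve_candidates("What is the first-line therapy for type 2 diabetes?", corpus))
# [('guide1', 6), ('note1', 3)]
```

The passages returned here play the role of the portions of the corpus 106 that the subsequent reasoning algorithms would analyze in depth.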

The scores obtained from the various reasoning algorithms are then weighted against a statistical model that summarizes a level of confidence that the pipeline 108 of the IBM Watson™ cognitive system 100, in this example, has regarding the evidence that the potential candidate answer is inferred by the question. This process is repeated for each of the candidate responses to generate a ranked listing of candidate responses, which may then be presented to the user that submitted the input request, e.g., a user of client computing device 110, or from which a final response is selected and presented to the user. More information about the pipeline 108 of the IBM Watson™ cognitive system 100 may be obtained, for example, from the IBM Corporation website, IBM Redbooks, and the like. For example, information about the pipeline of the IBM Watson™ cognitive system can be found in Yuan et al., “Watson and Healthcare,” IBM developerWorks, 2011 and “The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works” by Rob High, IBM Redbooks, 2012.
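One common way to weight several reasoning-algorithm scores against a statistical model is a logistic combination of the scores. The sketch below assumes exactly that; the three scores per candidate, the weights, and the bias are hypothetical values, not those of any actual Watson™ model. It computes a confidence for each candidate and returns the ranked listing described above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confidence(scores: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic combination of reasoning-algorithm scores into a confidence in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(
    candidate_scores: Dict[str, Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[Tuple[str, float]]:
    """Score every candidate answer and return them from most to least confident."""
    ranked = [(answer, confidence(s, weights, bias)) for answer, s in candidate_scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical scores from three reasoning algorithms and hypothetical trained weights.
weights = [2.0, 1.0, 0.5]
bias = -1.5
candidates = {"insulin": [0.4, 0.5, 0.2], "metformin": [0.9, 0.8, 0.6]}
for answer, conf in rank_candidates(candidates, weights, bias):
    print(f"{answer}: {conf:.3f}")
```

Either the full ranked list or only its first entry can be returned, matching the two presentation options in the text.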

As noted above, while the input to the cognitive system 100 from a client device may be posed in the form of a natural language request, the illustrative embodiments are not limited to such. Rather, the input request may in fact be formatted or structured as any suitable type of request which may be parsed and analyzed using structured and/or unstructured input analysis, including but not limited to the natural language parsing and analysis mechanisms of a cognitive system such as IBM Watson™, to determine the basis upon which to perform cognitive analysis and to provide a result of the cognitive analysis. In the case of a healthcare based cognitive system, this analysis may involve processing patient medical records, medical guidance documentation from one or more corpora, and the like, to provide a healthcare oriented cognitive system result.

In the context of the present invention, cognitive system 100 may provide a cognitive functionality for assisting with healthcare-based operations. For example, depending upon the particular implementation, the healthcare-based operations may comprise patient diagnostics, medical practice management systems, personal patient care plan generation and monitoring, patient electronic medical record (EMR) evaluation for various purposes, such as for identifying patients that are suitable for a medical trial or a particular type of medical treatment, or the like. Thus, the cognitive system 100 may be a healthcare cognitive system 100 that operates in the medical or healthcare type domains and that may process requests for such healthcare operations via the request processing pipeline 108, input as either structured or unstructured requests, natural language input questions, or the like.

As shown in FIG. 1, the cognitive system 100 is further augmented, in accordance with the mechanisms of the illustrative embodiments, to include logic implemented in specialized hardware, software executed on hardware, or any combination of specialized hardware and software executed on hardware, for implementing a repetitive portion identification and weighting engine 120 for identifying repetitive portions of clinical notes or portions of a patient EMR, analyzing the content of the repetitive portions, and weighting or ranking the repetitive portions for inclusion in a summary of the relevant portions of the patient EMR.

Repetitive portion identification and weighting engine 120 is provided with templates or patterns of natural language content that typically exist in a clinical note. In some embodiments, these templates or patterns for a particular doctor or medical professional may be learned over time through analysis of his or her particular style of entering information into clinical notes of the patient EMRs. Repetitive portion identification and weighting engine 120 applies these templates/patterns to clinical notes of a patient EMR to identify portions of the clinical notes that match the templates/patterns and, thus, are candidates for classification as repetitive content. Repetitive portion identification and weighting engine 120 compares content of the candidate portions to the patient's overall current medical condition and the treatments prescribed to determine whether the repetitive portion content is pertinent to the patient's overall medical condition or to the reason for the patient's scheduled encounter with the medical professional. Repetitive portion identification and weighting engine 120 weights or ranks the candidate repetitive portions based on the degree of relevance to the context of the encounter.

It should be appreciated that the repetitive content may be a portion of a clinical note, such that the evaluation of a single clinical note may result in different portions having different relative weights. For example, a doctor may copy an old entry in the patient's EMR and then add additional content pertinent to the particular encounter. As a result, repetitive portion identification and weighting engine 120 may assign a higher weight to the new content that was added and a lower weight to the repetitive content.

The relative weights of portions of clinical notes may be used to identify which portions of clinical notes should be reflected in summarizations generated by the cognitive system, which provides a summary graphical user interface (GUI) or dashboard representation of the patient's EMR that is directed to the current medical condition and treatment of the patient, anticipates the questions that the medical professional is likely to ask about the patient, and provides the corresponding answers from the patient's EMR. The mechanisms of the illustrative embodiments provide underlying functionality to facilitate the presentation of this summary representation by identifying which portions of the clinical notes are more important than others that are considered repetitious.

The key aspects of the clinical notes can be augmented with additional features based on external resources, such as unlabeled data and external rule-based systems related to similar data. Because manual labeling is very tedious and expensive, the availability of labeled data and manually constructed features is limited. The illustrative embodiments expand the training data features by automatically discovering correlations between the training data and features in the external resources. Hence, aspects related to the patient from social media streams, or aspects related to the patient diagnosis from a research corpus of data, are identified and weighted to inform the medical professional and to keep the clinical notes augmented and up to date with the patient's latest context. The additional features may lead the medical professional to ask clarifying questions, and comments added to the automatically identified features may trigger an increase in the weights of those features in proportion to the number of patients for which each feature was pursued.

As noted above, the mechanisms of the illustrative embodiments are rooted in the computer technology arts and are implemented using logic present in such computing or data processing systems. These computing or data processing systems are specifically configured, either through hardware, software, or a combination of hardware and software, to implement the various operations described above. As such, FIG. 2 is provided as an example of one type of data processing system in which aspects of the present invention may be implemented. Many other types of data processing systems may be likewise configured to specifically implement the mechanisms of the illustrative embodiments.

FIG. 2 is a block diagram of an example data processing system in which aspects of the illustrative embodiments are implemented. Data processing system 200 is an example of a computer, such as server 104 or client 110 in FIG. 1, in which computer usable code or instructions implementing the processes for illustrative embodiments of the present invention are located. In one illustrative embodiment, FIG. 2 represents a server computing device, such as server 104, which implements a cognitive system 100 and QA system pipeline 108 augmented to include the additional mechanisms of the illustrative embodiments described hereafter.

In the depicted example, data processing system 200 employs a hub architecture including North Bridge and Memory Controller Hub (NB/MCH) 202 and South Bridge and Input/Output (I/O) Controller Hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are connected to NB/MCH 202. Graphics processor 210 is connected to NB/MCH 202 through an accelerated graphics port (AGP).

In the depicted example, local area network (LAN) adapter 212 connects to SB/ICH 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communication ports 232, and PCI/PCIe devices 234 connect to SB/ICH 204 through bus 238 and bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS).

HDD 226 and CD-ROM drive 230 connect to SB/ICH 204 through bus 240. HDD 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. Super I/O (SIO) device 236 is connected to SB/ICH 204.

An operating system runs on processing unit 206. The operating system coordinates and provides control of various components within the data processing system 200 in FIG. 2. As a client, the operating system is a commercially available operating system such as Microsoft® Windows 10®. An object-oriented programming system, such as the Java™ programming system, may run in conjunction with the operating system and provide calls to the operating system from Java™ programs or applications executing on data processing system 200.

As a server, data processing system 200 may be, for example, an IBM® eServer™ System p® computer system, running the Advanced Interactive Executive (AIX®) operating system or the LINUX® operating system. Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors in processing unit 206. Alternatively, a single processor system may be employed.

Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as HDD 226, and are loaded into main memory 208 for execution by processing unit 206. The processes for illustrative embodiments of the present invention are performed by processing unit 206 using computer usable program code, which is located in a memory such as, for example, main memory 208, ROM 224, or in one or more peripheral devices 226 and 230, for example.

A bus system, such as bus 238 or bus 240 as shown in FIG. 2, is comprised of one or more buses. Of course, the bus system may be implemented using any type of communication fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communication unit, such as modem 222 or network adapter 212 of FIG. 2, includes one or more devices used to transmit and receive data. A memory may be, for example, main memory 208, ROM 224, or a cache such as found in NB/MCH 202 in FIG. 2.

Those of ordinary skill in the art will appreciate that the hardware depicted in FIGS. 1 and 2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIGS. 1 and 2. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system, other than the SMP system mentioned previously, without departing from the spirit and scope of the present invention.

Moreover, the data processing system 200 may take the form of any of a number of different data processing systems including client computing devices, server computing devices, a tablet computer, laptop computer, telephone or other communication device, a personal digital assistant (PDA), or the like. In some illustrative examples, data processing system 200 may be a portable computing device that is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data, for example. Essentially, data processing system 200 may be any known or later developed data processing system without architectural limitation.

FIG. 3 is an example diagram illustrating an interaction of elements of a healthcare cognitive system in accordance with one illustrative embodiment. The example diagram of FIG. 3 depicts an implementation of a healthcare cognitive system 300 that is configured to provide a cognitive summary of EMR data for patients. However, it should be appreciated that this is only an example implementation and other healthcare operations may be implemented in other embodiments of the healthcare cognitive system 300 without departing from the spirit and scope of the present invention.

Moreover, it should be appreciated that while FIG. 3 depicts the user 306 as a human figure, the interactions with user 306 may be performed using computing devices, medical equipment, and/or the like, such that user 306 may in fact be a computing device, e.g., a client computing device. For example, interactions between the user 306 and the healthcare cognitive system 300 will be electronic via a user computing device (not shown), such as a client computing device 110 or 112 in FIG. 1, communicating with the healthcare cognitive system 300 via one or more data communication links and potentially one or more data networks.

As shown in FIG. 3, in accordance with one illustrative embodiment, the user 306 submits a request 308 to the healthcare cognitive system 300, such as via a user interface on a client computing device that is configured to allow users to submit requests to the healthcare cognitive system 300 in a format that the healthcare cognitive system 300 can parse and process. The request 308 may include, or be accompanied with, information identifying patient attributes 318. These patient attributes 318 may include, for example, an identifier of the patient 302, social history, demographic information about the patient, symptoms, and other pertinent information obtained from responses to questions or information obtained from medical equipment used to monitor or gather data about the condition of the patient. Any information about the patient that may be relevant to a cognitive evaluation of the patient by the healthcare cognitive system 300 may be included in the request 308 and/or patient attributes 318.

The healthcare cognitive system 300 provides a cognitive system that is specifically configured to perform an implementation-specific healthcare-oriented cognitive operation. In the depicted example, this healthcare-oriented cognitive operation is directed to providing a cognitive summary 328 of EMR data to the user 306 to assist the user 306 in treating the patient based on their reported symptoms and other information gathered about the patient. The healthcare cognitive system 300 operates on the request 308 and patient attributes 318, utilizing information gathered from the medical corpus and other source data 326, treatment guidance data 324, and the patient EMRs 322 associated with the patient, to generate cognitive summary 328. The cognitive summary 328 may be presented in a ranked ordering with associated supporting evidence, obtained from the patient attributes 318 and data sources 322-326, indicating the reasoning as to why portions of EMR data 322 are being provided.

Note that EMR data 322, or data presented to the user, may come from home readings or measurements that the patient makes available and that are collected into EMR data 322.

In accordance with the illustrative embodiments herein, the healthcare cognitive system 300 is augmented to include a repetitive portion identification and weighting engine 320 for identifying repetitive portions of clinical notes or portions of a patient EMR, analyzing the content of the repetitive portions, and weighting or ranking the repetitive portions for inclusion in a summary of the relevant portions of the patient EMR. The repetitive portion identification and weighting engine 320 is described in further detail below.

FIG. 4 is a block diagram of a repetitive portion identification and weighting engine in accordance with an illustrative embodiment. Repetitive portion identification and weighting engine 400 comprises repetitive portion identification component 401, content analysis component 402, repetitive portion weighting/ranking component 403, and cognitive summary graphical user interface (GUI) generation component 404. Repetitive portion identification and weighting engine 400 receives electronic medical record (EMR) 411 containing clinical notes for a given patient.

Repetitive portion identification and weighting engine 400 also receives templates/patterns 412 of natural language content that typically exist in a clinical note. In some embodiments, templates/patterns 412 may be specific to a particular doctor or medical professional and learned over time through analysis of his or her particular style of entering information into clinical notes of patient EMRs.
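
One simple way to approximate learning templates/patterns 412 for a given doctor is to mask numbers in that doctor's notes and keep the lines that recur across a large share of them. The patent does not say how the templates are learned, so this frequency-based approach, the `learn_templates` and `normalize` helpers, and the `min_fraction` threshold are hypothetical.

```python
import re
from collections import Counter


def normalize(line):
    """Lowercase, replace numbers with a placeholder, and collapse whitespace."""
    line = re.sub(r"\d+(?:\.\d+)?", "<NUM>", line.lower())
    return " ".join(line.split())


def learn_templates(notes, min_fraction=0.5):
    """Return normalized lines that occur in at least min_fraction of one doctor's notes."""
    counts = Counter()
    for note in notes:
        # Count each normalized line at most once per note
        lines = {normalize(l) for l in note.splitlines() if l.strip()}
        counts.update(lines)
    threshold = min_fraction * len(notes)
    return sorted(line for line, n in counts.items() if n >= threshold)


notes = [
    "Patient seen for follow-up.\nBP 120/80\nKnee pain improving.",
    "Patient seen for follow-up.\nBP 135/90\nCough for 3 days.",
    "Patient seen for follow-up.\nBP 118/76",
]
templates = learn_templates(notes, min_fraction=0.6)
```

Masking numbers lets recurring entries such as blood pressure lines collapse into one template even though the values change from visit to visit.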

Repetitive portion identification component 401 applies templates/patterns 412 to clinical notes in patient EMR 411 to identify portions of the clinical notes that match templates/patterns within a given tolerance. For example, repetitive portion identification component 401 may use fuzzy matching to match portions of clinical notes in EMR 411 to templates/patterns 412. Any matching portions of EMR 411 are thus candidates for classification as repetitive content, indicating a relatively lower level of importance for inclusion in summarization. Repetitive portion identification component 401 may identify candidate repetitive portions using machine learning model 413.
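
The fuzzy matching step can be sketched with `difflib.SequenceMatcher` from the Python standard library, treating a line as a candidate repetitive portion when its similarity ratio to some template reaches the tolerance. The similarity measure, the line-level granularity, the helper names, and the 0.85 default tolerance are assumptions; the patent leaves the matching method and tolerance open and also allows machine learning model 413 to be used instead.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, mask numbers, and collapse whitespace so vitals or doses do not block a match."""
    text = re.sub(r"\d+(?:\.\d+)?", "<NUM>", text.lower())
    return " ".join(text.split())


def find_candidate_portions(note, templates, tolerance=0.85):
    """Return (line_index, line, template, score) for note lines that fuzzily match a template."""
    normalized_templates = [(t, normalize(t)) for t in templates]
    candidates = []
    for i, line in enumerate(note.splitlines()):
        if not line.strip():
            continue
        norm_line = normalize(line)
        best_template, best_score = None, 0.0
        for template, norm_template in normalized_templates:
            score = SequenceMatcher(None, norm_line, norm_template).ratio()
            if score > best_score:
                best_template, best_score = template, score
        if best_template is not None and best_score >= tolerance:
            candidates.append((i, line, best_template, best_score))
    return candidates


templates = [
    "Patient denies chest pain or shortness of breath.",
    "Continue current medications.",
]
note = (
    "Patient denies chest pain or shortness of breath\n"
    "New onset ankle swelling after fall.\n"
    "Continue current medication."
)
found = find_candidate_portions(note, templates)
```

Lines 0 and 2 differ from their templates only by punctuation or a plural and are flagged, while the new ankle finding is left alone.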

Content analysis component 402 compares the subject matter or content of the candidate repetitive portions to content that is important to the patient's overall current medical condition (e.g., the diseases with which the patient has been diagnosed) and the treatments prescribed (e.g., particular medications) to determine whether the repetitive content is pertinent to the patient's overall medical condition. In addition, content analysis component 402 performs a similar comparison with regard to the reason for the patient's current scheduled encounter with the medical professional. For example, the patient may have been diagnosed with Type 2 diabetes, but the current encounter is to look at a sprained ankle. In one embodiment, content analysis component 402 may use machine learning model 413 to compare the candidate repetitive portions to the patient's overall condition or reason for the encounter. Repetitive portion weighting/ranking component 403 assigns weights to the candidate repetitive portions based on the degree of relevance of the repetitive content to these contexts. Repetitive portion weighting/ranking component 403 may adjust the weights higher or lower based on trained machine learning model 413.
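
As a rough stand-in for the comparison performed by content analysis component 402 and the weighting by component 403, the sketch below scores a portion by simple term overlap with the patient's condition and with the reason for the encounter, then blends the two scores. Term overlap replaces trained machine learning model 413 here, and the helper names and the 0.4/0.6 blend weights are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a text span."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(portion, context_terms):
    """Fraction of context terms (single or multi-word) whose tokens all appear in the portion."""
    if not context_terms:
        return 0.0
    words = tokens(portion)
    hits = sum(1 for term in context_terms if tokens(term) <= words)
    return hits / len(context_terms)


def weight_portion(portion, condition_terms, encounter_terms,
                   w_condition=0.4, w_encounter=0.6):
    """Blend relevance to the overall condition with relevance to the encounter reason."""
    return (w_condition * relevance(portion, condition_terms)
            + w_encounter * relevance(portion, encounter_terms))


condition = ["type 2 diabetes", "metformin"]
encounter = ["ankle", "sprain"]
p_meds = "Metformin 500 mg twice daily for type 2 diabetes."
p_ankle = "Right ankle sprain, swelling noted."
p_ros = "Patient denies chest pain."
```

With the diabetes and sprained-ankle example from the text, the metformin line scores on the overall condition, while the ankle line scores higher because it matches the reason for the current encounter.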

It should be appreciated that the repetitive content may be a portion of a clinical note, such that the evaluation of a single clinical note may result in different portions having different relative weights. For example, a doctor may copy an old entry in the patient's EMR and then add additional content pertinent to the particular encounter. As a result, repetitive portion weighting/ranking component 403 weights the newly added content more highly than the repetitive content.
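
For the copy-forward case, a line-level diff between the previous note and the current note separates carried-over lines from newly added ones. The sketch uses `difflib.SequenceMatcher.get_opcodes()` on lists of lines; the function name and the two weight values are illustrative assumptions, not values taken from the patent.

```python
from difflib import SequenceMatcher


def weight_copied_forward(previous_note, current_note, new_weight=1.0, repeated_weight=0.2):
    """Pair each line of the current note with a weight: low if carried over, high if new."""
    prev_lines = previous_note.splitlines()
    curr_lines = current_note.splitlines()
    matcher = SequenceMatcher(None, prev_lines, curr_lines, autojunk=False)
    weighted = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "equal" blocks were copied from the previous note; "insert"/"replace" blocks are new.
        # "delete" blocks have j1 == j2 and contribute no lines of the current note.
        weight = repeated_weight if tag == "equal" else new_weight
        for line in curr_lines[j1:j2]:
            weighted.append((line, weight))
    return weighted


previous = "HPI: Type 2 diabetes, stable.\nMeds: metformin 500 mg BID."
current = (
    "HPI: Type 2 diabetes, stable.\n"
    "Meds: metformin 500 mg BID.\n"
    "Right ankle sprain after fall yesterday."
)
result = weight_copied_forward(previous, current)
```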

Cognitive summary graphical user interface (GUI) generation component 404 generates a GUI for a cognitive summary of the patient EMR 411 for the medical professional's encounter with the patient. Cognitive summary GUI generation component 404 uses the relative weightings of the portions of the clinical notes in EMR 411 to identify which portions of the clinical notes should be reflected in the cognitive summary GUI. The cognitive system provides the summary GUI or dashboard representation of the patient's EMR, which is directed to the current medical condition and treatment of the patient, anticipates the questions that the doctor is likely to ask about the patient, and provides the corresponding answers to the questions from the patient's EMR 411. Repetitive portion identification and weighting engine 400 provides underlying functionality to facilitate the presentation of this cognitive summary representation by identifying which portions of the clinical notes are more important than others that are considered repetitive.
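
Selecting which weighted portions appear in the cognitive summary can be as simple as keeping the top-weighted portions above a floor and restoring their original order so the summary still reads chronologically. `select_for_summary`, `max_items`, and `min_weight` are hypothetical; the patent does not specify a selection rule, and a real GUI would render these portions rather than return strings.

```python
def select_for_summary(weighted_portions, max_items=5, min_weight=0.3):
    """Keep the highest-weighted portions (above a floor) and restore their original order."""
    eligible = [(i, text, w) for i, (text, w) in enumerate(weighted_portions) if w >= min_weight]
    top = sorted(eligible, key=lambda item: item[2], reverse=True)[:max_items]
    return [text for _, text, _ in sorted(top, key=lambda item: item[0])]


portions = [
    ("Meds: metformin 500 mg BID", 0.2),
    ("Right ankle sprain, swelling present", 1.0),
    ("A1c 7.1% last month", 0.5),
    ("Denies chest pain", 0.1),
]
summary = select_for_summary(portions)
```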

The key aspects of the clinical notes can be augmented with additional features based on external resources 414, such as unlabeled data and external rule-based systems related to similar data. Because manual labeling is very tedious and expensive, the availability of labeled data and manually constructed features is limited. One illustrative embodiment expands the training data features by automatically discovering their correlations with features in the external resources 414. Hence, aspects related to the patient from social media streams, or aspects related to the patient diagnosis from a research corpus of data, are identified and weighted to prompt the medical professional to ask clarifying questions, and comments added to the automatically identified features trigger an increase in the weights of those features in proportion to the number of patients for which each feature was pursued.
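
The feedback rule, in which clinician comments raise a feature's weight in proportion to the number of patients for whom it was pursued, could be kept as a small per-feature record of patient identifiers. The linear boost formula, the `FeatureWeights` class, the feature name, and the `boost_per_patient` parameter are assumptions made for illustration.

```python
from collections import defaultdict


class FeatureWeights:
    """Hypothetical store that boosts externally derived features when clinicians act on them."""

    def __init__(self, base_weights, boost_per_patient=0.05):
        self.base = dict(base_weights)
        self.boost_per_patient = boost_per_patient
        # feature name -> ids of distinct patients for whom it was pursued
        self.patients_pursued = defaultdict(set)

    def record_comment(self, feature, patient_id):
        """A clinician comment on an automatically identified feature marks it as pursued."""
        self.patients_pursued[feature].add(patient_id)

    def weight(self, feature):
        """Base weight scaled up linearly with the number of distinct patients."""
        n_patients = len(self.patients_pursued.get(feature, set()))
        return self.base.get(feature, 0.0) * (1.0 + self.boost_per_patient * n_patients)


fw = FeatureWeights({"sleep_complaint": 0.5}, boost_per_patient=0.1)
fw.record_comment("sleep_complaint", "p1")
fw.record_comment("sleep_complaint", "p2")
fw.record_comment("sleep_complaint", "p1")  # repeat comment for the same patient
```

Counting distinct patients rather than comments keeps one heavily discussed case from inflating a feature's weight.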

FIG. 5 illustrates the operation of augmenting key aspects of clinical notes with additional features based on external resources in accordance with an illustrative embodiment. Segmented EMR data 501 is divided into labeled clinical notes 503 and unlabeled clinical notes 504. Labeled clinical notes 503 can be clustered into cluster of label #1 505 through cluster of label #N 506, which are provided to feature construction 507, formatting features 510, and character features 511. Feature construction 507 generates keyword and distance features dictionary vector 508.
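
The patent names formatting features 510, character features 511, and a keyword and distance features dictionary vector 508 without defining them. Under one possible interpretation, which is an assumption and not the patent's definition, they could be computed per note segment as follows; all function names and the specific features chosen are hypothetical.

```python
import re


def character_features(segment):
    """Surface statistics of the raw characters in a note segment."""
    n = max(len(segment), 1)
    return {
        "length": len(segment),
        "upper_ratio": sum(c.isupper() for c in segment) / n,
        "digit_ratio": sum(c.isdigit() for c in segment) / n,
    }


def formatting_features(segment):
    """Layout cues that often distinguish templated sections from free text."""
    stripped = segment.strip()
    return {
        "is_bullet": stripped.startswith(("-", "*")),
        "has_header": re.match(r"^[A-Za-z /]+:", stripped) is not None,
        "ends_with_period": stripped.endswith("."),
    }


def keyword_distance_vector(segment, dictionary):
    """One entry per dictionary keyword: token offset of its first occurrence, or -1 if absent."""
    words = re.findall(r"[a-z0-9]+", segment.lower())
    return [words.index(k) if k in words else -1 for k in dictionary]


segment = "Meds: metformin 500 mg"
```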

External medical resources 502 provide features to keyword features 509 and automatic feature expansion 512, which also receives keyword and distance features dictionary vector 508, unlabeled clinical notes 504, and keyword features 509. Automatic feature expansion 512 generates keyword features and weights 513, which are provided to model generator 514. Keyword and distance features dictionary vector 508, keyword features 509, formatting features 510, and character features 511 are also provided to model generator 514, which trains a machine learning model for augmenting key aspects of clinical notes with additional features based on external medical resources 502.
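
Automatic feature expansion 512 is not specified in detail. One plausible realization, shown here as an assumption, weights each external-resource term by how often the unlabeled notes that mention it also mention a seed keyword from the labeled data. The function name, the single-token term restriction, and the `min_weight` cutoff are hypothetical.

```python
import re


def expand_features(seed_keywords, external_terms, unlabeled_notes, min_weight=0.5):
    """Score single-token external-resource terms by co-occurrence with seed keywords."""
    seeds = {k.lower() for k in seed_keywords}
    note_tokens = [set(re.findall(r"[a-z0-9]+", note.lower())) for note in unlabeled_notes]
    expanded = {k: 1.0 for k in seeds}  # seed keywords keep full weight
    for term in external_terms:
        term = term.lower()
        if term in seeds:
            continue
        with_term = [toks for toks in note_tokens if term in toks]
        if not with_term:
            continue
        # Conditional co-occurrence: share of notes mentioning the term that also mention a seed
        weight = sum(1 for toks in with_term if toks & seeds) / len(with_term)
        if weight >= min_weight:
            expanded[term] = weight
    return expanded


unlabeled = [
    "metformin continued, hba1c 7.2",
    "hba1c rechecked, metformin dose raised",
    "ibuprofen for ankle pain",
    "glipizide added to metformin",
    "ibuprofen prn",
    "hba1c pending",
]
features = expand_features(["metformin"], ["hba1c", "glipizide", "ibuprofen"], unlabeled)
```

The resulting keyword-to-weight map corresponds loosely to keyword features and weights 513 that would be passed to model generator 514.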

FIG. 6 is a flowchart illustrating operation of a mechanism for repetitive portion identification and weighting in accordance with an illustrative embodiment. Operation begins (block 600), and the mechanism trains a machine learning model for weighting/ranking repetitive portions of electronic medical records (EMRs) (block 601). The mechanism applies templates/patterns to clinical notes of a patient EMR to identify portions of the clinical notes that match (block 602).

The mechanism performs content analysis to determine whether the candidate repetitive portions are relevant to the patient's overall medical condition (block 603). The mechanism also performs content analysis to determine whether the candidate repetitive portions are relevant to the reason for the scheduled encounter (block 604). The mechanism performs weighting/ranking of the candidate repetitive portions based on the content analysis (block 605). Then, the mechanism generates a cognitive summary graphical user interface (GUI) including at least a subset of the candidate repetitive portions (block 606). Thereafter, operation ends (block 607).
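
The flow of blocks 602 through 606 can be sketched end to end with simplified stand-ins: fuzzy template matching for block 602, term overlap for blocks 603 and 604, a fixed down-weighting factor for block 605, and top-k selection for block 606. Training in block 601 is omitted, and the fixed weights stand in for the trained machine learning model; every name and constant here is hypothetical.

```python
import re
from difflib import SequenceMatcher


def _norm(text):
    """Lowercased alphanumeric tokens joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def _overlap(text, terms):
    """Fraction of (possibly multi-word) terms whose tokens all appear in text."""
    if not terms:
        return 0.0
    words = set(_norm(text).split())
    return sum(1 for t in terms if set(_norm(t).split()) <= words) / len(terms)


def summarize_note(note, templates, condition_terms, encounter_terms,
                   tolerance=0.85, max_items=3):
    weighted = []
    for line in filter(str.strip, note.splitlines()):
        # Block 602: a line that fuzzily matches a template is a candidate repetitive portion
        is_candidate = any(
            SequenceMatcher(None, _norm(line), _norm(t)).ratio() >= tolerance for t in templates
        )
        # Blocks 603-604: relevance to the overall condition and to the encounter reason
        relevance = 0.4 * _overlap(line, condition_terms) + 0.6 * _overlap(line, encounter_terms)
        # Block 605: down-weight template-matched content instead of discarding it
        weighted.append((line, relevance * (0.5 if is_candidate else 1.0)))
    # Block 606: keep the highest-weighted non-zero portions, in original note order
    kept = [i for i, (_, w) in enumerate(weighted) if w > 0]
    top = sorted(kept, key=lambda i: weighted[i][1], reverse=True)[:max_items]
    return [weighted[i][0] for i in sorted(top)]


templates = ["Patient denies chest pain or shortness of breath.", "Continue current medications."]
note = (
    "Patient denies chest pain or shortness of breath.\n"
    "Type 2 diabetes stable on metformin.\n"
    "Right ankle sprain after fall, swelling present.\n"
    "Continue current medications."
)
summary = summarize_note(note, templates, ["diabetes", "metformin"], ["ankle", "sprain"], max_items=2)
```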

As noted above, it should be appreciated that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a communication bus, such as a system bus, for example. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. The memory may be of various types including, but not limited to, ROM, PROM, EPROM, EEPROM, DRAM, SRAM, Flash memory, solid state memory, and the like.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening wired or wireless I/O interfaces and/or controllers, or the like. I/O devices may take many different forms other than conventional keyboards, displays, pointing devices, and the like, such as, for example, communication devices coupled through wired or wireless connections including, but not limited to, smart phones, tablet computers, touch screen devices, voice recognition devices, and the like. Any known or later developed I/O device is intended to be within the scope of the illustrative embodiments.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters for wired communications. Wireless communication based network adapters may also be utilized including, but not limited to, 802.11 a/b/g/n wireless communication adapters, Bluetooth wireless adapters, and the like. Any known or later developed network adapters are intended to be within the spirit and scope of the present invention.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

1. A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a repetitive portion identification and weighting engine, the method comprising: applying, by a repetitive portion identification component executing within the repetitive portion identification and weighting engine, a plurality of templates to clinical notes of a patient EMR to identify one or more candidate portions that match at least one of the plurality of templates; performing, by a content analysis component executing within the repetitive portion identification and weighting engine, content analysis on the one or more candidate portions to determine whether each given candidate portion is relevant; assigning, by a weighting component executing within the repetitive portion identification and weighting engine, a relative weight to each given candidate portion based on relevance using a trained machine learning model; generating, by a cognitive summary graphical user interface (GUI) generation component executing within the repetitive portion identification and weighting engine, a cognitive summary reflecting at least a subset of the one or more candidate portions of the patient EMR; and outputting the cognitive summary in a GUI to a user.
2. The method of claim 1, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to the patient's overall medical condition.
3. The method of claim 1, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to a reason for the patient's scheduled encounter with a medical professional.
4. The method of claim 1, further comprising augmenting key aspects of clinical notes with additional features based on external resources.
5. The method of claim 4, wherein augmenting key aspects of clinical notes with additional features based on external resources comprises: dividing the patient EMRs into labeled clinical notes and unlabeled clinical notes; and extracting keyword features, formatting features, and character features from the labeled clinical notes.
6. The method of claim 5, wherein augmenting key aspects of clinical notes with additional features based on external resources further comprises generating a keyword and distance features dictionary vector.
7. The method of claim 6, wherein augmenting key aspects of clinical notes with additional features based on external resources further comprises performing automatic feature expansion based on the keyword features, the keyword and distance features dictionary vector, and the unlabeled clinical notes to generate keyword features and weights, the method further comprising training the machine learning model based on the keyword and distance features dictionary vector, the keyword features and weights, the keyword features, the formatting features, and the character features.
8. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to implement a repetitive portion identification and weighting engine, wherein the computer readable program causes the computing device to: apply, by a repetitive portion identification component executing within the repetitive portion identification and weighting engine, a plurality of templates to clinical notes of a patient EMR to identify one or more candidate portions that match at least one of the plurality of templates; perform, by a content analysis component executing within the repetitive portion identification and weighting engine, content analysis on the one or more candidate portions to determine whether each given candidate portion is relevant; assign, by a weighting component executing within the repetitive portion identification and weighting engine, a relative weight to each given candidate portion based on relevance using a trained machine learning model; generate, by a cognitive summary graphical user interface (GUI) generation component executing within the repetitive portion identification and weighting engine, a cognitive summary reflecting at least a subset of the one or more candidate portions of the patient EMR; and output the cognitive summary in a GUI to a user.
9. The computer program product of claim 8, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to the patient's overall medical condition.
10. The computer program product of claim 8, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to a reason for the patient's scheduled encounter with a medical professional.
11. The computer program product of claim 8, wherein the computer readable program causes the computing device to augment key aspects of clinical notes with additional features based on external resources.
12. The computer program product of claim 11, wherein augmenting key aspects of clinical notes with additional features based on external resources comprises: dividing the patient EMRs into labeled clinical notes and unlabeled clinical notes; and extracting keyword features, formatting features, and character features from the labeled clinical notes.
13. The computer program product of claim 12, wherein augmenting key aspects of clinical notes with additional features based on external resources further comprises generating a keyword and distance features dictionary vector.
14. The computer program product of claim 13, wherein augmenting key aspects of clinical notes with additional features based on external resources further comprises performing automatic feature expansion based on the keyword features, the keyword and distance features dictionary vector, and the unlabeled clinical notes to generate keyword features and weights, wherein the computer readable program further causes the computing device to train the machine learning model based on the keyword and distance features dictionary vector, the keyword features and weights, the keyword features, the formatting features, and the character features.
15. An apparatus comprising: at least one processor; and a memory coupled to the at least one processor, wherein the memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to implement a repetitive portion identification and weighting engine, wherein the instructions cause the at least one processor to: apply, by a repetitive portion identification component executing within the repetitive portion identification and weighting engine, a plurality of templates to clinical notes of a patient EMR to identify one or more candidate portions that match at least one of the plurality of templates; perform, by a content analysis component executing within the repetitive portion identification and weighting engine, content analysis on the one or more candidate portions to determine whether each given candidate portion is relevant; assign, by a weighting component executing within the repetitive portion identification and weighting engine, a relative weight to each given candidate portion based on relevance using a trained machine learning model; generate, by a cognitive summary graphical user interface (GUI) generation component executing within the repetitive portion identification and weighting engine, a cognitive summary reflecting at least a subset of the one or more candidate portions of the patient EMR; and output the cognitive summary in a GUI to a user.
16. The apparatus of claim 15, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to the patient's overall medical condition.
17. The apparatus of claim 15, wherein performing the content analysis on the one or more candidate portions comprises determining whether each given candidate portion is relevant to a reason for the patient's scheduled encounter with a medical professional.
18. The apparatus of claim 15, wherein the instructions further cause the at least one processor to augment key aspects of clinical notes with additional features based on external resources.
19. The apparatus of claim 18, wherein augmenting key aspects of clinical notes with additional features based on external resources comprises: dividing the patient EMRs into labeled clinical notes and unlabeled clinical notes; and extracting keyword features, formatting features, and character features from the labeled clinical notes.
20. The apparatus of claim 19, wherein augmenting key aspects of clinical notes with additional features based on external resources further comprises: generating a keyword and distance features dictionary vector; and performing automatic feature expansion based on the keyword features, the keyword and distance features dictionary vector, and the unlabeled clinical notes to generate keyword features and weights, wherein the instructions further cause the at least one processor to train the machine learning model based on the keyword and distance features dictionary vector, the keyword features and weights, the keyword features, the formatting features, and the character features.
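For illustration only, the pipeline recited in claims 8 and 15 (template matching, content analysis, relevance weighting, and summary generation) can be sketched in Python. This is a minimal sketch under stated assumptions and not the claimed implementation. The regular-expression templates, the keyword-overlap relevance test, the recency feature, and the hand-set logistic coefficients standing in for the trained machine learning model are all hypothetical. The GUI output step is reduced to printing a list.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical templates (regular expressions) standing in for the
# "plurality of templates" applied to clinical notes.
TEMPLATES = [
    re.compile(r"^(History of Present Illness|HPI):", re.IGNORECASE),
    re.compile(r"^(Past Medical History|PMH):", re.IGNORECASE),
    re.compile(r"^(Medications|Allergies):", re.IGNORECASE),
]


@dataclass
class Candidate:
    note_id: int            # index of the clinical note (higher = more recent)
    text: str               # the matched line of the note
    relevant: bool = False
    weight: float = 0.0


def identify_candidates(notes):
    """Return every note line that matches at least one template."""
    candidates = []
    for note_id, note in enumerate(notes):
        for line in note.splitlines():
            line = line.strip()
            if any(t.search(line) for t in TEMPLATES):
                candidates.append(Candidate(note_id, line))
    return candidates


def analyze_content(candidates, context_terms):
    """Assumed relevance test: a candidate is relevant if it shares a word with
    the context terms (e.g., the condition or the reason for the encounter)."""
    for c in candidates:
        words = set(re.findall(r"[a-z]+", c.text.lower()))
        c.relevant = bool(words & context_terms)
    return candidates


def weight_candidates(candidates, num_notes, coef=(2.0, 1.0), bias=-1.5):
    """Stand-in for the trained model: a logistic score over two hypothetical
    features, relevance and note recency. Coefficients are hand-set, not learned."""
    for c in candidates:
        recency = (c.note_id + 1) / num_notes
        z = bias + coef[0] * float(c.relevant) + coef[1] * recency
        c.weight = 1.0 / (1.0 + math.exp(-z))
    return candidates


def summarize(candidates, k=2):
    """Return up to k highest-weighted relevant candidates, skipping verbatim repeats."""
    ranked = sorted((c for c in candidates if c.relevant),
                    key=lambda c: c.weight, reverse=True)
    summary, seen = [], set()
    for c in ranked:
        if len(summary) >= k:
            break
        if c.text in seen:
            continue
        seen.add(c.text)
        summary.append(c.text)
    return summary


if __name__ == "__main__":
    notes = [
        "HPI: chest pain for two days\nPlan: follow up",
        "HPI: chest pain improving\nMedications: aspirin",
        "PMH: hypertension\nMedications: aspirin",
    ]
    cands = identify_candidates(notes)
    analyze_content(cands, {"chest", "pain", "hypertension"})
    weight_candidates(cands, len(notes))
    print(summarize(cands, k=2))
```

Run as a script, the example prints `['PMH: hypertension', 'HPI: chest pain improving']`. The recency feature ranks later relevant portions above earlier ones. The medication lines match a template but are dropped because they share no word with the context terms. Because the weighting step is an isolated function, a model actually trained on the features recited in claims 14 and 20 could replace it without changing the rest of the sketch.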